43 research outputs found

    Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared, poses a challenging problem in high dimensional data analysis. We describe a new procedure for selecting significant genes as recursive cluster elimination (RCE) rather than recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE.</p> <p>Results</p> <p>We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method, to identify correlated gene clusters, and Support Vector Machines (SVMs), a supervised machine learning classification method, to identify and score (rank) those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE) is then applied to iteratively remove those clusters of genes that contribute the least to the classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Utilization of gene clusters, rather than individual genes, enhances the supervised classification accuracy of the same data as compared to the accuracy when either SVM or Penalized Discriminant Analysis (PDA) with recursive feature elimination (SVM-RFE and PDA-RFE) are used to remove genes based on their individual discriminant weights.</p> <p>Conclusion</p> <p>SVM-RCE provides improved classification accuracy with complex microarray data sets when it is compared to the classification accuracy of the same datasets using either SVM-RFE or PDA-RFE. SVM-RCE identifies clusters of correlated genes that when considered together provide greater insight into the structure of the microarray data. Clustering genes for classification appears to result in some concomitant clustering of samples into subgroups.</p> <p>Our present implementation of SVM-RCE groups genes using the correlation metric. The success of the SVM-RCE method in classification suggests that gene interaction networks or other biologically relevant metrics that group genes based on functional parameters might also be useful.</p> <p/

    Learning from positive examples when the negative class is undetermined- microRNA gene identification

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The application of machine learning to classification problems that depend only on positive examples is gaining attention in the computational biology community. We and others have described the use of two-class machine learning to identify novel miRNAs. These methods require the generation of an artificial negative class. However, designation of the negative class can be problematic and if it is not properly done can affect the performance of the classifier dramatically and/or yield a biased estimate of performance. We present a study using one-class machine learning for microRNA (miRNA) discovery and compare one-class to two-class approaches using naïve Bayes and Support Vector Machines. These results are compared to published two-class miRNA prediction approaches. We also examine the ability of the one-class and two-class techniques to identify miRNAs in newly sequenced species.</p> <p>Results</p> <p>Of all methods tested, we found that 2-class naive Bayes and Support Vector Machines gave the best accuracy using our selected features and optimally chosen negative examples. One class methods showed average accuracies of 70–80% versus 90% for the two 2-class methods on the same feature sets. However, some one-class methods outperform some recently published two-class approaches with different selected features. Using the EBV genome as and external validation of the method we found one-class machine learning to work as well as or better than a two-class approach in identifying true miRNAs as well as predicting new miRNAs.</p> <p>Conclusion</p> <p>One and two class methods can both give useful classification accuracies when the negative class is well characterized. The advantage of one class methods is that it eliminates guessing at the optimal features for the negative class when they are not well defined. In these cases one-class methods can be superior to two-class methods when the features which are chosen as representative of that positive class are well defined.</p> <p>Availability</p> <p>The OneClassmiRNA program is available at: <abbrgrp><abbr bid="B1">1</abbr></abbrgrp></p

    Classification and Prediction of Survival in Patients with the Leukemic Phase of Cutaneous T Cell Lymphoma

    Get PDF
    We have used cDNA arrays to investigate gene expression patterns in peripheral blood mononuclear cells from patients with leukemic forms of cutaneous T cell lymphoma, primarily Sezary syndrome (SS). When expression data for patients with high blood tumor burden (Sezary cells >60% of the lymphocytes) and healthy controls are compared by Student's t test, at P < 0.01, we find 385 genes to be differentially expressed. Highly overexpressed genes include Th2 cells–specific transcription factors Gata-3 and Jun B, as well as integrin β1, proteoglycan 2, the RhoB oncogene, and dual specificity phosphatase 1. Highly underexpressed genes include CD26, Stat-4, and the IL-1 receptors. Message for plastin-T, not normally expressed in lymphoid tissue, is detected only in patient samples and may provide a new marker for diagnosis. Using penalized discriminant analysis, we have identified a panel of eight genes that can distinguish SS in patients with as few as 5% circulating tumor cells. This suggests that, even in early disease, Sezary cells produce chemokines and cytokines that induce an expression profile in the peripheral blood distinctive to SS. Finally, we show that using 10 genes, we can identify a class of patients who will succumb within six months of sampling regardless of their tumor burden

    Peripheral Immune Cell Gene Expression Predicts Survival of Patients with Non-Small Cell Lung Cancer

    Get PDF
    Prediction of cancer recurrence in patients with non-small cell lung cancer (NSCLC) currently relies on the assessment of clinical characteristics including age, tumor stage, and smoking history. A better prediction of early stage cancer patients with poorer survival and late stage patients with better survival is needed to design patient-tailored treatment protocols. We analyzed gene expression in RNA from peripheral blood mononuclear cells (PBMC) of NSCLC patients to identify signatures predictive of overall patient survival. We find that PBMC gene expression patterns from NSCLC patients, like patterns from tumors, have information predictive of patient outcomes. We identify and validate a 26 gene prognostic panel that is independent of clinical stage. Many additional prognostic genes are specific to myeloid cells and are more highly expressed in patients with shorter survival. We also observe that significant numbers of prognostic genes change expression levels in PBMC collected after tumor resection. These post-surgery gene expression profiles may provide a means to re-evaluate prognosis over time. These studies further suggest that patient outcomes are not solely determined by tumor gene expression profiles but can also be influenced by the immune response as reflected in peripheral immune cells

    Classification and biomarker identification using gene network modules and support vector machines

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. The selection of the genes most significant to the classification problem is a challenging issue in high dimension data analysis and interpretation. A previous study with SVM-RCE (Recursive Cluster Elimination), suggested that classification based on groups of correlated genes sometimes exhibits better performance than classification using single genes. Large databases of gene interaction networks provide an important resource for the analysis of genetic phenomena and for classification studies using interacting genes.</p> <p>We now demonstrate that an algorithm which integrates network information with recursive feature elimination based on SVM exhibits good performance and improves the biological interpretability of the results. We refer to the method as SVM with Recursive Network Elimination (SVM-RNE)</p> <p>Results</p> <p>Initially, one thousand genes selected by t-test from a training set are filtered so that only genes that map to a gene network database remain. The Gene Expression Network Analysis Tool (GXNA) is applied to the remaining genes to form <it>n </it>clusters of genes that are highly connected in the network. Linear SVM is used to classify the samples using these clusters, and a weight is assigned to each cluster based on its importance to the classification. The least informative clusters are removed while retaining the remainder for the next classification step. This process is repeated until an optimal classification is obtained.</p> <p>Conclusion</p> <p>More than 90% accuracy can be obtained in classification of selected microarray datasets by integrating the interaction network information with the gene expression information from the microarrays.</p> <p>The Matlab version of SVM-RNE can be downloaded from <url>http://web.macam.ac.il/~myousef</url></p

    Modulation of gene expression in heart and liver of hibernating black bears (Ursus americanus)

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Hibernation is an adaptive strategy to survive in highly seasonal or unpredictable environments. The molecular and genetic basis of hibernation physiology in mammals has only recently been studied using large scale genomic approaches. We analyzed gene expression in the American black bear, <it>Ursus americanus</it>, using a custom 12,800 cDNA probe microarray to detect differences in expression that occur in heart and liver during winter hibernation in comparison to summer active animals.</p> <p>Results</p> <p>We identified 245 genes in heart and 319 genes in liver that were differentially expressed between winter and summer. The expression of 24 genes was significantly elevated during hibernation in both heart and liver. These genes are mostly involved in lipid catabolism and protein biosynthesis and include RNA binding protein motif 3 (<it>Rbm3</it>), which enhances protein synthesis at mildly hypothermic temperatures. Elevated expression of protein biosynthesis genes suggests induction of translation that may be related to adaptive mechanisms reducing cardiac and muscle atrophies over extended periods of low metabolism and immobility during hibernation in bears. Coordinated reduction of transcription of genes involved in amino acid catabolism suggests redirection of amino acids from catabolic pathways to protein biosynthesis. We identify common for black bears and small mammalian hibernators transcriptional changes in the liver that include induction of genes responsible for fatty acid β oxidation and carbohydrate synthesis and depression of genes involved in lipid biosynthesis, carbohydrate catabolism, cellular respiration and detoxification pathways.</p> <p>Conclusions</p> <p>Our findings show that modulation of gene expression during winter hibernation represents molecular mechanism of adaptation to extreme environments.</p

    Genome-Wide Maps of Circulating miRNA Biomarkers for Ulcerative Colitis

    Get PDF
    Inflammatory Bowel Disease – comprised of Crohn's Disease and Ulcerative Colitis (UC) - is a complex, multi-factorial inflammatory disorder of the gastrointestinal tract. In this study we have explored the utility of naturally occurring circulating miRNAs as potential blood-based biomarkers for non-invasive prediction of UC incidences. Whole genome maps of circulating miRNAs in micro-vesicles, Peripheral Blood Mononuclear Cells and platelets have been constructed from a cohort of 20 UC patients and 20 normal individuals. Through Significance Analysis of Microarrays, a signature of 31 differentially expressed platelet-derived miRNAs has been identified and biomarker performance estimated through a non-probabilistic binary linear classification using Support Vector Machines. Through this approach, classifier measurements reveal a predictive score of 92.8% accuracy, 96.2% specificity and 89.5% sensitivity in distinguishing UC patients from normal individuals. Additionally, the platelet-derived biomarker signature can be validated at 88% accuracy through qPCR assays, and a majority of the miRNAs in this panel can be demonstrated to sub-stratify into 4 highly correlated intensity based clusters. Analysis of predicted targets of these biomarkers reveal an enrichment of pathways associated with cytoskeleton assembly, transport, membrane permeability and regulation of transcription factors engaged in a variety of regulatory cascades that are consistent with a cell-mediated immune response model of intestinal inflammation. Interestingly, comparison of the miRNA biomarker panel and genetic loci implicated in IBD through genome-wide association studies identifies a physical linkage between hsa-miR-941 and a UC susceptibility loci located on Chr 20. Taken together, analysis of these expression maps outlines a promising catalog of novel platelet-derived miRNA biomarkers of clinical utility and provides insight into the potential biological function of these candidates in disease pathogenesis

    Mouse Chromosome 11

    Full text link
    Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/46996/1/335_2004_Article_BF00648429.pd

    Quantitation of transient gene expression after electroporation

    No full text
    corecore